Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks
نویسندگان
چکیده
Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex; standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets, which furthermore matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation – a straightforward application of Taylor expansions to neural networks – and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets – that gradients are shattered – and investigates the hypothesis that, by exploring the space of activation configurations more thoroughly, adaptive optimizers such as RMSProp and Adam are able to converge to better solutions.
منابع مشابه
On the convergence speed of artificial neural networks in the solving of linear systems
Artificial neural networks have the advantages such as learning, adaptation, fault-tolerance, parallelism and generalization. This paper is a scrutiny on the application of diverse learning methods in speed of convergence in neural networks. For this aim, first we introduce a perceptron method based on artificial neural networks which has been applied for solving a non-singula...
متن کاملQuantified advantage of discontinuous weight selection in approximations with deep neural networks
We consider approximations of 1D Lipschitz functions by deep ReLU networks of a fixed width. We prove that without the assumption of continuous weight selection the uniform approximation error is lower than with this assumption at least by a factor logarithmic in the size of the network.
متن کاملCystoscopy Image Classication Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
متن کاملError bounds for approximations with deep ReLU networks
We study expressive power of shallow and deep neural networks with piece-wise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks more efficiently approximate smooth functions than shallow networks. In the case of approximations of 1D Lipschitz...
متن کاملDeep Online Convex Optimization with Gated Games
Methods from convex optimization are widely used as building blocks for deep learning algorithms. However, the reasons for their empirical success are unclear, since modern convolutional networks (convnets), incorporating rectifier units and max-pooling, are neither smooth nor convex. Standard guarantees therefore do not apply. This paper provides the first convergence rates for gradient descen...
متن کامل